While least-squares fitting procedures are commonly used in data analysis and are extensively
discussed in the literature devoted to this subject, the proper assessment of errors resulting
from such fits has received relatively little attention. The present work considers statistical
errors in the fitted parameters, as well as in the values of the fitted function itself, resulting
from random errors in the data. Expressions are derived for the standard error of the fit, as a
function of the independent variable, for the general nonlinear and linear fitting problems.
Additionally, closed-form expressions are derived for some examples commonly encountered in the
scientific and engineering fields, namely, ordinary polynomial and Gaussian fitting functions.
These results have direct application to the assessment of antenna gain and system temperature
characteristics, in addition to a broad range of problems in data analysis. The effects of the
nature of the data and the choice of fitting function on the ability to accurately model the
system under study are discussed, and some general rules are deduced to assist workers intent on
maximizing the amount of information obtained from a given set of measurements.


I. Summary 

The fitting of data of the form $(x_i, y_i)$, $i = 1, 2, \ldots, N$, by a function
$y(x; a_1, \ldots, a_M) = y(x; \mathbf{a})$, depending on $M$ coefficients, $a_j$, and the
independent variable $x$, is common in scientific and engineering work. The procedure most often
used for optimizing the coefficients in order to obtain the best fit is the least-squares method,
in which the quantity

$$\chi^2(\mathbf{a}) = \sum_{i=1}^{N} \frac{[y_i - y(x_i; \mathbf{a})]^2}{\sigma_i^2}$$

is minimized, where $\sigma_i$ is the standard deviation of the random errors of $y_i$, which we
assume to be normally distributed.

The result of such a fitting procedure is the function $y(x; \mathbf{a}_0)$, where $\mathbf{a}_0$
is the coefficient vector that minimizes $\chi^2(\mathbf{a})$, and the question arises as to what
standard error to associate with the values of this resulting fit. Standard references on
statistics and data analysis give the well-known result that the variances of the coefficients,
$a_j$, are given by the diagonal elements of the covariance matrix, $\mathbf{C}$, i.e.,
$\sigma_{a_j}^2 = C_{jj}$, where $\mathbf{C}$ is the inverse of the matrix $\mathbf{H}$, variously
referred to as the curvature or Hessian matrix. While it is often useful to know what the
parameter errors are, especially if the parameters themselves are related to some underlying
physical model of the process under study, this does not tell one directly what the error is in
the values of the fitting function itself, and a knowledge of this error, which is a function of
the independent variable, is frequently of value in characterizing the system performance.

Lacking a general discussion of this in the literature, it seems that various workers assume a
mean error equal to either the rms value of the data errors or $1/\sqrt{N}$ times this. It is
shown in the present work, however, that for the general least-squares fit, the weighted mean
value of the variance of the fit, averaged over the data points $x = x_i$, is given by

$$\frac{1}{N}\sum_{i=1}^{N} \frac{\sigma_y^2(x_i)}{\sigma_i^2} = \frac{M}{N}$$

so that for constant data errors, $\sigma_i = \sigma$,

$$\frac{1}{N}\sum_{i=1}^{N} \sigma_y^2(x_i) = \frac{M}{N}\,\sigma^2$$

Thus, the mean standard error depends on the order of the fit, increasing as the square root of
this value.

The error in the value of the fitted function, however, always depends on $x$, even when the
standard deviations of the data errors, $\sigma_i$, are all the same, independent of $x$. An
analysis of these errors leads to the general result that the variance of the value of the fitted
function, resulting from the random data errors, is given by

$$\sigma_y^2(x) = \sum_{j=1}^{M}\sum_{k=1}^{M} C_{jk}\, d_j(x)\, d_k(x) = \mathbf{d}(x)^T \mathbf{C}\, \mathbf{d}(x)$$

where $[\mathbf{d}(x)]_j = d_j(x) = [\partial y(x; \mathbf{a})/\partial a_j]|_{\mathbf{a}_0}$ and
$T$ implies matrix transpose. For the special case of linear fitting, where
$y(x; \mathbf{a}) = \sum_{j=1}^{M} a_j X_j(x)$, this becomes

$$\sigma_y^2(x) = \sum_{j=1}^{M}\sum_{k=1}^{M} C_{jk}\, X_j(x)\, X_k(x) = \mathbf{x}(x)^T \mathbf{C}\, \mathbf{x}(x)$$

where $\mathbf{x}(x)$ is a column vector whose elements are $X_j(x)$. An example of the application
of this result to a set of antenna aperture efficiency versus elevation data is shown in Figs. 1
through 4.

For the important class of basis functions corresponding to ordinary polynomials,
$X_j(x) = x^{j-1}$, it is shown that if the data are uniformly distributed along the x-axis and
the data standard errors are constant, $\sigma_i = \sigma$, then simple, closed-form expressions
can be derived for $\sigma_y(x)$. Thus, we find

$$\eta^2 = \sum_{j=0}^{M-1} A_{2j}^{(M-1)}(N)\, \xi^{2j}$$

[Figures 1 through 4; horizontal axis: elevation, deg]

Fig. 1. Quadratic fit to antenna aperture efficiency versus elevation data showing the confidence
limits corresponding to 68.3 percent ($\pm\sigma_y(x)$). The data standard errors were constant
and equal to 0.91 efficiency percent, and the computed reduced $\chi^2$ was 1.06.

Fig. 2. The same as Fig. 1 except the fit and limits are extended beyond the existing data to
illustrate the effect of the rapid increase in the error of the fit.

Fig. 3. The standard error of the fit corresponding to Fig. 1 for the range of elevation over
which the data exist.

Fig. 4. The same as Fig. 3 except the range of elevation is extended beyond the existing data.


where $\eta = \sqrt{N}\,[\sigma_y(x)/\sigma]$, $\xi = (x - \bar{x})/\sigma_x$,
$\bar{x} = (1/N)\sum_{i=1}^{N} x_i$, $\sigma_x^2 = (1/N)\sum_{i=1}^{N} (x_i - \bar{x})^2$, and the
coefficients are listed in Table 1 for the cases $M = 2, 3, 4$, corresponding to straight-line,
quadratic, and cubic fits. For the straight-line fit, the coefficients appearing in the above
expression are independent of the number of data points, $N$, while for the quadratic and cubic
cases they become independent for reasonable values of $N$, say, $N > 10$. These results are
summarized graphically with the set of universal error curves shown in Fig. 5.







Table 1. The coefficients in the equation for the squared normalized standard error of the fit for
straight-line, quadratic, and cubic fits.

Fig. 5. Universal, normalized error curves for straight-line, quadratic, and cubic fits for
constant, normally distributed data errors, $\sigma_i = \sigma$, with $N$ uniformly distributed
data points. The symbols associated with each curve correspond to the results of Monte Carlo
calculations carried out as a check (see Appendix for details).


As an example of a similar development for nonlinear fitting, the case of a Gaussian function
given by

$$y(x; \mathbf{a}) = a_1 \exp\left[-\frac{(x - a_2)^2}{2 a_3^2}\right]$$

is treated exactly, and it is shown that for uniformly distributed data points located
symmetrically relative to the peak, and constant data errors,
where $\sigma_t = (\sigma_x/a_3)$, while if the data errors are proportional to the value of the
function, $\sigma(x) \propto y(x; \mathbf{a})$, one finds


$$\eta^2 = N\left[\frac{\sigma_y(x)}{\sigma(x)}\right]^2$$


where in both cases it is assumed that the number of data points, $N$, is reasonably large, of the
order of 20 or more, and in the former case, it is also assumed that the spread of the data
points, $L$, is greater than about $\pm 2a_3$. These results are shown graphically in Figs. 6 and 7.
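
The general expression $\sigma_y^2(x) = \mathbf{d}(x)^T \mathbf{C}\, \mathbf{d}(x)$ can be evaluated numerically for the Gaussian function. The following is a minimal sketch rather than the procedure used to produce the figures: it assumes the curvature matrix is built from first derivatives only, as in Section VII, and the helper names (`gaussian_gradient`, `invert`, `covariance`, `fit_variance`) and the sample parameter values are hypothetical.

```python
import math

def gaussian_gradient(x, a1, a2, a3):
    """Partial derivatives of y = a1*exp(-(x - a2)^2 / (2*a3^2)) with respect to a1, a2, a3."""
    u = (x - a2) / a3
    g = math.exp(-0.5 * u * u)
    return [g, a1 * g * u / a3, a1 * g * u * u / a3]

def invert(H):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    m = len(H)
    aug = [list(map(float, H[r])) + [float(r == c) for c in range(m)] for r in range(m)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(m):
            if r != c:
                f = aug[r][c]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[c])]
    return [row[m:] for row in aug]

def covariance(xs, sigmas, params):
    """C = H^-1, with h_jk = sum_i d_j(x_i) d_k(x_i) / sigma_i^2 (first derivatives only)."""
    m = len(params)
    H = [[0.0] * m for _ in range(m)]
    for x, s in zip(xs, sigmas):
        d = gaussian_gradient(x, *params)
        for j in range(m):
            for k in range(m):
                H[j][k] += d[j] * d[k] / (s * s)
    return invert(H)

def fit_variance(x, C, params):
    """sigma_y^2(x) = d(x)^T C d(x)."""
    d = gaussian_gradient(x, *params)
    m = len(d)
    return sum(C[j][k] * d[j] * d[k] for j in range(m) for k in range(m))

# Hypothetical example: 21 uniformly spaced points centered on the peak, constant errors.
params = (2.0, 0.0, 1.0)                      # a1, a2, a3 (illustrative values)
xs = [-3.0 + 0.3 * i for i in range(21)]
sigmas = [0.05] * len(xs)
C = covariance(xs, sigmas, params)
weighted_mean = sum(fit_variance(x, C, params) / s ** 2 for x, s in zip(xs, sigmas)) / len(xs)
print(weighted_mean, len(params) / len(xs))   # both should equal M/N
```

The final line checks the weighted-mean result quoted earlier in this summary: averaged over the data points, $\sigma_y^2(x_i)/\sigma_i^2$ equals $M/N$.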



Fig. 6. Universal, normalized error curves for the general Gaussian function
$y(x; \mathbf{a}) = a_1 \exp[-(x - a_2)^2 / 2a_3^2]$ for constant, normally distributed data
errors, $\sigma_i = \sigma$, with $N$ uniformly distributed data points centered on the peak of
the Gaussian. The symbols associated with each curve correspond to the results of Monte Carlo
calculations carried out as a check (see Appendix for details).

[Figure 7; horizontal axis: $(x - \bar{x})/\sigma_x$]

Fig. 7. The same as Fig. 6 except that the data errors are now proportional to the function,
$\sigma_i = \sigma(x_i) \propto y(x_i; \mathbf{a})$. The symbols associated with each curve
correspond to the results of Monte Carlo calculations carried out as a check (see Appendix for
details).


Another important aspect of the general least-squares fitting problem is the optimization of the
sampling during the taking of data, e.g., what spacing should one use along the x-axis and how many points
should one use in order to reduce the parameter errors to acceptable levels? Since the parameter errors 
for the case of polynomial fits depend sensitively on the location of the origin of the x-scale, and in any 
event the coefficients themselves are unlikely to have a fundamental significance in terms of an underlying 
physical model of the process under study, we restrict ourselves to a consideration of Gaussian fits as an 
example of some practical importance. 


Thus, for uniformly and symmetrically distributed data points, we find the following. For constant
data errors, $\sigma_i = \sigma$,

where the sums $S_0$, $S_2$, and $S_4$ are given by Eq. (A-7) of the Appendix and
$\delta^2 = [12/(N^2 - 1)](\sigma_x^2/a_3^2) = [12/(N^2 - 1)]\sigma_t^2$. The normalized errors
$\sqrt{N}(\sigma_{a_1}/\sigma)$, $\sqrt{N}[(\sigma_{a_2}/a_3)/(\sigma/a_1)]$, and
$\sqrt{N}[(\sigma_{a_3}/a_3)/(\sigma/a_1)]$ are plotted as functions of the variable $L/a_3$ for
selected values of the number of data points, $N$, in Fig. 8. In the limit $N \to \infty$ and
$L/a_3 > 4$, the above normalized standard errors become

For data errors proportional to the function, $\sigma_i = P\,y(x_i; \mathbf{a})$,

II. Introduction

If one measures a single quantity N times in the presence of normally distributed, random errors, 
then it is well known that the variance of the mean of these measurements is equal to the variance of 
the measurements themselves divided by N, and in the absence of systematic errors, the mean value 
approaches the true value of the quantity as the number of measurements increases without limit. 

Fig. 8. Universal, normalized error curves for the three parameters of a Gaussian fit for constant
data errors, $\sigma_i = \sigma$, as a function of the normalized x-axis interval $L/a_3$ for
various values of the number of data points, $N$: (a) $a_1$, (b) $a_2$, and (c) $a_3$.


In the case of least-squares fitting of a given function to a given set of data that are likewise subject 
to normally distributed, random errors, the resulting fit is the mean function corresponding to the data, 
and the question arises as to what variance to assign to the errors of the values of this function. Here, two 
related concerns arise. First, the fitting function will contain a certain number of parameters, M, and one 
or more of these may be of interest in relation to a physical quantity whose value is sought. For example, 
system noise temperatures may have been recorded as an antenna is scanned through a point source of 
radiation, and one may be interested in the peak value or the half-width of the antenna pattern or both. 
Or, one may have determined a series of system noise temperatures at different elevations and wish to 
know what the maximum system noise temperature is and at what elevation it occurs. In each case, an 
appropriate fitting function could be chosen so that one or more of the parameters involved corresponds 
to the quantity or quantities of interest, and one would like to know, therefore, what standard error to 
assign to the quantities so determined. Alternatively, one may want to know what the standard
error of the value of the fitted function itself is as a function of the independent variable, say
declination or elevation in the two examples cited above. In the remainder of this article, we
will refer to this standard error of the value of the fitted function as the error of the fit and
designate it by the symbol $\sigma_y(x)$.

The first instance considered above, namely, determining the error of one or more fitting parameters, 
has a straightforward answer given in terms of the diagonal elements of the covariance matrix of the fit, 
and is well known. Less well known, however, particularly among nonmathematicians, is the relationship 
between this matrix and the error of the fit as a function of the independent variable. Some insight 
into this problem can be obtained by examining Fig. 9, where we show the results of sequentially fitting 
straight lines to a series of data sets generated by the same linear function, $y = a_1 + a_2 x$, but with
different random errors, corresponding, however, to the same normal distribution, i.e., having the same
constant standard deviation, $\sigma$. Since each of these lines could have resulted from the same underlying
function, albeit with different probability, the ensemble of all possible lines defines the error statistics of 
the particular fit actually obtained with the particular set of errors present during the data gathering, 
i.e., the data errors actually obtained correspond to but one of the infinite number of sets that could have 
resulted from the measurements. In the case shown in Fig. 9, one can see that the error of the fit tends to 
be smaller toward the centroid of the data points and larger at the extremes. In fact, it is shown below 
that the error curve in this case is actually a hyperbola and that the same general behavior is found 
for higher-degree polynomials, albeit with differing functional dependence. The standard error of the fit,
$\sigma_y(x)$, derived below, is shown superimposed on the ensemble of straight lines in Fig. 9.
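
The ensemble picture described above can be imitated numerically. The sketch below is illustrative only and is not the Monte Carlo procedure of the Appendix: it repeatedly generates data from one parent line with fresh normally distributed errors, refits, and compares the scatter of the fitted value at a chosen $x$ with the hyperbolic form of Eq. (14), $\sigma_y(x) = \sigma\sqrt{(1 + \xi^2)/N}$. All names, the seed, and the sample values are assumptions.

```python
import math
import random

def fit_line(x, y):
    """Unweighted straight-line fit (same result as the weighted fit when all sigma_i are equal)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - slope * xbar, slope

def ensemble_std(x, a1, a2, sigma, x_eval, trials, rng):
    """Standard deviation of the fitted value at x_eval over many simulated data sets."""
    values = []
    for _ in range(trials):
        y = [a1 + a2 * xi + rng.gauss(0.0, sigma) for xi in x]
        c1, c2 = fit_line(x, y)
        values.append(c1 + c2 * x_eval)
    mean = sum(values) / trials
    return math.sqrt(sum((v - mean) ** 2 for v in values) / trials)

def predicted_std(x, sigma, x_eval):
    """sigma_y(x) = sigma * sqrt((1 + xi^2) / N), with xi = (x - xbar) / sigma_x."""
    n = len(x)
    xbar = sum(x) / n
    var_x = sum((xi - xbar) ** 2 for xi in x) / n
    return sigma * math.sqrt((1.0 + (x_eval - xbar) ** 2 / var_x) / n)

rng = random.Random(12345)              # fixed seed so the run is repeatable
x = [float(i) for i in range(11)]       # 11 uniformly spaced points
results = {}
for x_eval in (0.0, 5.0, 10.0):
    simulated = ensemble_std(x, 1.0, 0.5, 0.2, x_eval, 4000, rng)
    results[x_eval] = (simulated, predicted_std(x, 0.2, x_eval))
    print(x_eval, results[x_eval])
```

With a few thousand trials, the simulated scatter should agree with the prediction to within a few percent, and it is smallest at the centroid of the data, as in Fig. 9.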

It is the purpose of this article to discuss the above errors and, in particular, to present results that 
will permit one to determine the standard error of the fit as a function of the independent variable, as 
well as to establish confidence limits for these errors. 

Fig. 9. Plot of 20 straight lines resulting from fits to data generated from the same parent
straight line but with different random, normally distributed errors having the same statistics.



III. Least-Squares Fitting 

The fitting of data of the form $(x_i, y_i)$, $i = 1, 2, \ldots, N$, by a function
$y(x; a_1, \ldots, a_M) = y(x; \mathbf{a})$ depending on $M$ coefficients, $a_j$, and the
independent variable $x$ is frequently used in scientific and engineering work, either to
determine the most likely values of the fitting coefficients, which may relate to some physically
reasonable model of the process under study, or simply to permit the prediction of the most
likely value of the dependent variable $y$ for a given value of $x$, including those values where
no data exist.

While many techniques for finding optimum values of the fitting coefficients exist, the one most
commonly used is the least-squares method, where the coefficients are determined by minimizing
the quantity $\chi^2$, given by

$$\chi^2(\mathbf{a}) \equiv \sum_{i=1}^{N} \frac{1}{\sigma_i^2}\,[y_i - y(x_i; \mathbf{a})]^2 \qquad (1)$$

The variances $\sigma_i^2$ of the data values $y_i$ are assumed known, either from a knowledge of
the experimental errors involved in the measurements or from analysis of the data itself, and it
is assumed that the errors themselves are normally distributed. While this latter requirement is
not essential for the derivations that follow, it is nonetheless a common assumption, valid for
most measurements, and permits one to establish confidence limits, as discussed below.

This approach, also called the method of maximum likelihood, is described in numerous
publications and is the basis of many curve-fitting programs available in various software
packages devoted to data analysis. However, in spite of its widespread use, there are a number of
aspects of the least-squares fitting problem that are not often discussed in the literature on
the subject, especially those having to do with the proper evaluation and interpretation of
errors. In the following section, the general linear, least-squares problem, in which the fitting
coefficients $a_j$ enter into the fitting function $y(x; \mathbf{a})$ in a linear manner, is
formulated, and the solution for the coefficients is obtained. The purpose here is to establish
the notation and display the main results, rather than to provide a detailed derivation.$^1$ This
is followed by an analysis of standard errors for the various quantities encountered and a
section devoted to illustration of the main ideas through consideration of some simple examples.
Next, the general nonlinear, least-squares fitting problem, in which the coefficients $a_j$ enter
into the fitting function $y(x; \mathbf{a})$ in a nonlinear way, is discussed, and the standard
errors are once again analyzed. Two final sections provide a discussion of a related problem
involving errors of the difference between a function at two values of the argument and a general
discussion of the significance of the results.

IV. The Linear Least-Squares Problem 

For the linear, least-squares problem, the fitting function $y(x; \mathbf{a})$ may be written

$$y(x; \mathbf{a}) = \sum_{j=1}^{M} a_j X_j(x) \qquad (2)$$

where $X_j(x)$ are arbitrary basis functions of the independent variable, $x$, and the $M$
coefficients, $a_j$, are to be determined by minimizing $\chi^2$, as given by Eq. (1). This
problem is most often framed in the language of linear algebra, where the fitting function
$y(x_i; \mathbf{a})$ is taken as an $N$-element column vector, the coefficients $a_j$ as an
$M$-element column vector, and the basis functions $X_j(x_i)$ are represented by an $N \times M$
matrix $X_{ij}$, so that letting $x = x_i$, Eq. (2) may be written

$$y(x_i; \mathbf{a}) = \sum_{j=1}^{M} X_{ij}\, a_j, \qquad i = 1, 2, \ldots, N$$

or, in the matrix equation form, $\mathbf{y} = \mathbf{X}\mathbf{a}$.

$^1$ The latter may be found in numerous sources, such as [1], which has been used as a guide to define the notation.


Similarly, by defining the column vector $b_i = (y_i/\sigma_i)$ and the matrix
$A_{ij} = (X_{ij}/\sigma_i)$, Eq. (1) may be written

$$\chi^2 = (\mathbf{b} - \mathbf{A}\mathbf{a})^T (\mathbf{b} - \mathbf{A}\mathbf{a}) \qquad (3)$$

where $T$ implies matrix transpose. Thus, the extremum condition for the coefficients,

$$\frac{\partial}{\partial a_j}\left(\chi^2\right) = 0$$

results in the matrix equation

$$(\mathbf{A}^T \mathbf{A})\, \mathbf{a} = \mathbf{A}^T \mathbf{b}$$

having the solution

$$\mathbf{a} = (\mathbf{A}^T \mathbf{A})^{-1} \mathbf{A}^T \mathbf{b} = \mathbf{C} \mathbf{A}^T \mathbf{b} \qquad (4a)$$

In the above, we have defined the matrix $\mathbf{H} = \mathbf{A}^T \mathbf{A}$ and its inverse,
$\mathbf{C}$. The symmetric $M \times M$ matrix $\mathbf{C} = \mathbf{H}^{-1}$ is called the
covariance matrix and is central to the determination of standard errors, as discussed below. In
component form, Eq. (4a) may be written

$$a_j = a_j(y_1, \ldots, y_N) = \sum_{k=1}^{M} C_{jk} \sum_{i=1}^{N} y_i\, \frac{X_k(x_i)}{\sigma_i^2} \qquad (4b)$$

where the dependence of the coefficients on the measured data has been made explicit in the notation.
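
The normal equations $(\mathbf{A}^T\mathbf{A})\,\mathbf{a} = \mathbf{A}^T\mathbf{b}$ can be solved directly for small problems. The sketch below is a minimal illustration, not a production routine: the function names (`solve_linear`, `linear_fit`) and the sample data are hypothetical, and forming $\mathbf{A}^T\mathbf{A}$ explicitly is assumed to be acceptable for the well-conditioned, low-order example shown.

```python
def solve_linear(H, g):
    """Solve H a = g by Gauss-Jordan elimination with partial pivoting."""
    m = len(H)
    aug = [list(map(float, H[r])) + [float(g[r])] for r in range(m)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(m):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[c])]
    return [aug[r][m] / aug[r][r] for r in range(m)]

def linear_fit(x, y, sigma, basis):
    """Coefficients a minimizing chi^2 for y(x; a) = sum_j a_j X_j(x)."""
    A = [[X(xi) / si for X in basis] for xi, si in zip(x, sigma)]   # A_ij = X_j(x_i) / sigma_i
    b = [yi / si for yi, si in zip(y, sigma)]                       # b_i = y_i / sigma_i
    m = len(basis)
    H = [[sum(row[j] * row[k] for row in A) for k in range(m)] for j in range(m)]   # H = A^T A
    g = [sum(row[j] * bi for row, bi in zip(A, b)) for j in range(m)]               # A^T b
    return solve_linear(H, g)

# Hypothetical example: noise-free quadratic data recovered with a quadratic basis.
basis = [lambda x: 1.0, lambda x: x, lambda x: x * x]
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0 + 2.0 * xi - 0.5 * xi * xi for xi in x]
a = linear_fit(x, y, [0.1] * len(x), basis)
print(a)   # close to [1.0, 2.0, -0.5]
```

For high-degree polynomials or nearly collinear basis functions, a factorization of $\mathbf{A}$ itself (QR or singular value decomposition) is generally better conditioned than the explicit normal equations.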

Before proceeding with a discussion of errors and their evaluation, the question of the
suitability of the set of basis functions chosen at the outset should be addressed. It is shown
in various treatises on statistics and data analysis that the resulting fit to the function
$y(x; \mathbf{a})$ is meaningful if $\chi^2$, as given by Eq. (1), is of the order of
$N - M = \nu$, referred to as the number of degrees of freedom for the system [2]. Since the
quantity $\chi^2$ should be reasonably close to $\nu$ for a good fit, the related quantity
$\chi_\nu^2 = \chi^2/\nu$, called the reduced $\chi^2$, is often used as a measure of the
suitability of the chosen basis functions for fitting to the given data, i.e., the condition
$\chi_\nu^2 \approx 1$ is taken to indicate that the fit is meaningful. For $\nu \gg 1$,
$\chi^2$, and hence the reduced $\chi^2$, is normally distributed, with the latter having a mean
of 1 and a standard deviation of $\sqrt{2/\nu}$.
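
As a small illustration of this criterion, the sketch below computes the reduced $\chi^2$ and the spread $\sqrt{2/\nu}$ expected for it. The function name `reduced_chi2` and the numbers are hypothetical; in particular, `y_fit` stands in for fitted values that would normally come from an actual fit, and the normal approximation is only rough for the small $\nu$ used here.

```python
import math

def reduced_chi2(y, y_fit, sigma, n_params):
    """Return (chi^2 / nu, sqrt(2 / nu)) with nu = N - M degrees of freedom."""
    chi2 = sum(((yi - fi) / si) ** 2 for yi, fi, si in zip(y, y_fit, sigma))
    nu = len(y) - n_params
    return chi2 / nu, math.sqrt(2.0 / nu)

# Hypothetical data and fitted values for a two-parameter (straight-line) model.
y = [1.1, 1.9, 3.2, 3.9, 5.1, 5.9]
y_fit = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
chi2_nu, spread = reduced_chi2(y, y_fit, [0.1] * 6, n_params=2)
print(chi2_nu, spread)

# Doubling every assumed sigma_i divides chi^2 by four, as is evident from Eq. (1).
chi2_nu_doubled, _ = reduced_chi2(y, y_fit, [0.2] * 6, n_params=2)
print(chi2_nu_doubled)
```

The second call shows how strongly the statistic depends on the assumed data errors: the same residuals give a reduced $\chi^2$ four times smaller when every $\sigma_i$ is doubled.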

Figure 10 illustrates the significance of the quantity $\chi_\nu^2$ for the simple case of a
polynomial fit. In Fig. 10(a), the data clearly show a quadratic dependence on the independent
variable, but an attempt has been made to obtain a fit with a straight line. The formal
application of Eq. (4a) results in the computation of two coefficients for the line, but since at
least three coefficients are obviously required to give a reasonable fit to the given data, these
coefficients are meaningless, both in terms of providing a decent fit and in terms of the model
used to represent the data, and this is manifest in the large value of $\chi_\nu^2$ resulting
from the fit. On the other hand, increasing the degree of the polynomial to 2 by adding a
quadratic term results in a much more reasonable-looking fit and a value of $\chi_\nu^2$ close
to 1.




Fig. 10. The significance of the quantity $\chi_\nu^2$ for a polynomial fit: (a) a straight-line
fit to data having a quadratic dependence on the independent variable, illustrating the increase
in $\chi_\nu^2$ resulting from a poor fit to the data, and (b) a straight-line fit to
straight-line data. In each case, the constant standard deviation of the data errors, $\sigma$,
is relative to the parent curve, a quadratic for Fig. 10(a) and a straight line for Fig. 10(b)
(see Section IV for discussion).


It must be pointed out, however, that the value of $\chi_\nu^2$ that one obtains with a given
fitting function depends not only on the suitability of the function chosen but also on the
values of the data errors assumed, as is evident from the definition given by Eq. (1). Thus, in
Fig. 10(b), where a straight-line fit has been obtained for straight-line data, an accurate
knowledge of the data errors, $\sigma_i$, leads to a reduced $\chi^2$ of 0.9. However, had the
data errors in Fig. 10(a) been overestimated, as could result with an overly cautious
experimenter, for example, then the calculated $\chi_\nu^2$ would be smaller than it should be,
so that one might end up with a value close to 1 as the result of a relatively poor fit and
excessively large $\sigma_i$'s. This points up the importance of properly assessing the data
errors, a subject we will return to in a later section.


V. Standard Errors 

Since the coefficient values $a_j$ depend on the data values as given by Eq. (4b), or its matrix
equivalent, Eq. (4a), the uncertainties in the coefficients depend on the uncertainties in the
data values. Thus,

$$\delta a_j = \sum_{k=1}^{M} C_{jk} \sum_{i=1}^{N} \frac{X_k(x_i)}{\sigma_i^2}\, \delta y_i$$

or, in matrix form,

$$\delta\mathbf{a} = \mathbf{C} \mathbf{A}^T \delta\mathbf{b}$$

so that the covariance of $a_j$ and $a_k$ is given by

$$(\sigma_a^2)_{jk} = \langle \delta a_j\, \delta a_k \rangle$$

where $\langle\ \rangle$ indicates an ensemble average, giving the expectation value of the
quantity in parentheses. In matrix form,

$$\sigma_a^2 = \langle \delta\mathbf{a}\, \delta\mathbf{a}^T \rangle = \langle \mathbf{C} (\mathbf{A}^T \delta\mathbf{b}) (\mathbf{A}^T \delta\mathbf{b})^T \mathbf{C}^T \rangle = \mathbf{C} \mathbf{A}^T \langle \delta\mathbf{b}\, \delta\mathbf{b}^T \rangle \mathbf{A} \mathbf{C}^T \qquad (5b)$$

where use has been made of the relation
$(\mathbf{A}^T \delta\mathbf{b})^T = \delta\mathbf{b}^T \mathbf{A}$. Since the data errors
$\delta y_i$ are assumed to be statistically uncorrelated,
$\langle \delta y_i\, \delta y_l \rangle = \delta_{il}\, \sigma_i^2$, where $\delta_{il}$ is the
Kronecker delta.

Thus, $\langle \delta\mathbf{b}\, \delta\mathbf{b}^T \rangle = \mathbf{I}$, the identity matrix,
so that Eq. (5b) becomes

$$\sigma_a^2 = \mathbf{C} (\mathbf{A}^T \mathbf{A}) \mathbf{C}^T = \mathbf{C} \qquad (6a)$$

where the definition of the matrix $\mathbf{C} = (\mathbf{A}^T \mathbf{A})^{-1}$ [see Eq. (4a)]
and its symmetry have been used. In component form,

$$(\sigma_a^2)_{jk} = C_{jk} \qquad (6b)$$

so that, unlike the data errors, the resulting errors in the coefficients are correlated, i.e.,
the off-diagonal terms of the matrix $\mathbf{C}$ do not, in general, vanish. The diagonal
elements give the variances of the coefficients,

$$\sigma_{a_j}^2 = C_{jj} \qquad (6c)$$

We are now in a position to determine the error in the value of the fitted function $y(x)$ that
results from the errors in the coefficients $a_j$.

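From Eq. (2),

$$\delta y(x; \mathbf{a}) = \sum_{j=1}^{M} \delta a_j\, X_j(x)$$

so that the covariance $\sigma_y^2(x, x') = \langle \delta y(x)\, \delta y(x') \rangle$ is given by

$$\sigma_y^2(x, x') = \sum_{j=1}^{M}\sum_{k=1}^{M} \langle \delta a_j\, \delta a_k \rangle X_j(x) X_k(x') = \sum_{j=1}^{M}\sum_{k=1}^{M} C_{jk}\, X_j(x)\, X_k(x') \qquad (7)$$

We note that this holds for any two values of the independent variable, not just the data points,
and that it is independent of the parameter values, $a_j$.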

For the special case $x' = x$, we have the variance

$$\sigma_y^2(x) = \sum_{j=1}^{M}\sum_{k=1}^{M} C_{jk}\, X_j(x)\, X_k(x) = \mathbf{x}(x)^T \mathbf{C}\, \mathbf{x}(x) \qquad (8)$$

where $\mathbf{x}(x)$ is a column vector whose elements are $X_j(x)$. Since the above expression
for $\sigma_y^2(x, x')$ does not, in general, vanish for $x' \neq x$, the errors in
$y(x; \mathbf{a})$ at two different values of $x$ are correlated, unlike the data errors. Some
insight into the significance of the result given by Eq. (8) can be obtained by considering the
weighted mean value of $\sigma_y^2(x)$ averaged over all data points $x = x_i$:

$$\frac{1}{N}\sum_{i=1}^{N} \frac{\sigma_y^2(x_i)}{\sigma_i^2} = \frac{1}{N}\sum_{j=1}^{M}\sum_{k=1}^{M} C_{jk} \sum_{i=1}^{N} \frac{X_{ij} X_{ik}}{\sigma_i^2}$$

From the definition of the matrix $\mathbf{A}$, $(X_{ij}/\sigma_i) = A_{ij}$, and hence the sum
over $i$ is just $(\mathbf{A}^T \mathbf{A})_{kj} = h_{kj} = (\mathbf{C}^{-1})_{kj}$, thus giving us

$$\frac{1}{N}\sum_{i=1}^{N} \frac{\sigma_y^2(x_i)}{\sigma_i^2} = \frac{1}{N}\sum_{j=1}^{M}\sum_{k=1}^{M} C_{jk}\, (\mathbf{C}^{-1})_{kj} = \frac{M}{N} \qquad (9)$$

For the special case of constant data errors, independent of the value of $x$,
$\sigma_i = \sigma = \text{constant}$, so that we obtain

$$\frac{1}{N}\sum_{i=1}^{N} \sigma_y^2(x_i) = \frac{M}{N}\,\sigma^2 \qquad (10)$$

This result is analogous to the variance of the mean for the case of one value of $x$, where
$M = 1$. Thus, we see that, as a result of fitting to the function $y(x)$, the variance of $y$,
averaged over the data points, is reduced by the ratio $M/N$ relative to the constant data
variance. This implies that the higher the order of the fit required by the data to reduce
$\chi_\nu^2$ to a value close to 1, the larger will be the resulting errors of the values of the
fitted function.
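
Equations (8) and (9) are straightforward to check numerically. The following sketch is an illustration under stated assumptions, not code from this work: the helper names (`invert`, `covariance`, `fit_variance`) are hypothetical, and the deliberately non-uniform sample points and unequal data errors are made up to show that Eq. (9) does not depend on uniform sampling.

```python
def invert(H):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    m = len(H)
    aug = [list(map(float, H[r])) + [float(r == c) for c in range(m)] for r in range(m)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(m):
            if r != c:
                f = aug[r][c]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[c])]
    return [row[m:] for row in aug]

def covariance(x, sigma, basis):
    """C = (A^T A)^-1 with A_ij = X_j(x_i) / sigma_i."""
    A = [[X(xi) / si for X in basis] for xi, si in zip(x, sigma)]
    m = len(basis)
    H = [[sum(row[j] * row[k] for row in A) for k in range(m)] for j in range(m)]
    return invert(H)

def fit_variance(x_eval, C, basis):
    """Eq. (8): sigma_y^2(x) = x(x)^T C x(x)."""
    v = [X(x_eval) for X in basis]
    m = len(v)
    return sum(C[j][k] * v[j] * v[k] for j in range(m) for k in range(m))

# Hypothetical quadratic fit (M = 3) to N = 7 unevenly spaced points with unequal errors.
basis = [lambda x: 1.0, lambda x: x, lambda x: x * x]
x = [0.0, 0.5, 1.5, 2.0, 3.5, 4.0, 5.0]
sigma = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.2]
C = covariance(x, sigma, basis)
weighted_mean = sum(fit_variance(xi, C, basis) / si ** 2 for xi, si in zip(x, sigma)) / len(x)
print(weighted_mean, len(basis) / len(x))   # Eq. (9): both equal M/N
```

Because `fit_variance` accepts any `x_eval`, the same function can be used to trace the error of the fit between and beyond the data points.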

In the following section, we examine these results for the simple case of a straight-line fit to the data, 
where an exact analysis enables one to bring out some of the more important consequences of the general 
theory. 


VI. Illustrative Examples 

A. Fitting to a Straight Line 

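In order to illustrate some of the main consequences of the theory developed above, we first
consider the simplest linear, least-squares fitting problem, that of a straight line [3], for
which $M = 2$. Then, the basis functions are given by $X_{ij} = x_i^{j-1}$, $j = 1, 2$, and the
fitting function by

$$y(x_i; \mathbf{a}) = \sum_{j=1}^{2} a_j x_i^{j-1} = a_1 + a_2 x_i$$

The matrix $\mathbf{H}$ is, thus,

$$\mathbf{H} = \begin{pmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{N} \dfrac{1}{\sigma_i^2} & \sum_{i=1}^{N} \dfrac{x_i}{\sigma_i^2} \\ \sum_{i=1}^{N} \dfrac{x_i}{\sigma_i^2} & \sum_{i=1}^{N} \dfrac{x_i^2}{\sigma_i^2} \end{pmatrix} \qquad (11)$$

while its inverse, $\mathbf{C}$, is

$$\mathbf{C} = \frac{1}{\Delta} \begin{pmatrix} h_{22} & -h_{12} \\ -h_{21} & h_{11} \end{pmatrix}$$

In the above, $\Delta = h_{11} h_{22} - h_{12}^2$ and $h_{12} = h_{21}$, thus demonstrating the
general result that both $\mathbf{H}$ and $\mathbf{C}$ are symmetric.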

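For the special case where the data variances are independent of $x$, $\sigma_i = \sigma$, the
diagonal terms of the latter, which correspond to the variances of the constant and the linear
coefficient, are given by

$$C_{11} = \frac{\sigma^2 \sum_{i=1}^{N} x_i^2}{N \sum_{i=1}^{N} x_i^2 - \left(\sum_{i=1}^{N} x_i\right)^2}, \qquad C_{22} = \frac{N \sigma^2}{N \sum_{i=1}^{N} x_i^2 - \left(\sum_{i=1}^{N} x_i\right)^2}$$

The variance of the data points is$^2$

$$\sigma_x^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \left(\frac{1}{N}\sum_{i=1}^{N} x_i\right)^2$$

and if we choose the origin of the x-scale symmetrically, then the sum over $x_i$ vanishes so that
we may write the above in the simplified form

$$C_{11} = \frac{\sigma^2}{N}, \qquad C_{22} = \frac{1}{N}\,\frac{\sigma^2}{\sigma_x^2}$$
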

These results agree with our intuition in that the variance of the constant term is just the
variance of the mean of the data values, while the variance of the slope decreases relative to
this as the spread of the data points increases. From Eq. (8), the variance of the value of the
fitted function, in this case a straight line, becomes

$$\sigma_y^2(x) = \sum_{j=1}^{2}\sum_{k=1}^{2} C_{jk}\, x^{j+k-2} = C_{11} + 2 C_{12} x + C_{22} x^2 = \frac{1}{\Delta}\left(h_{22} - 2 h_{12} x + h_{11} x^2\right) \qquad (12)$$

$^2$ For convenience, we are using a definition involving $1/N$ rather than $1/(N - 1)$.

This is the equation of a conic section in the variables $x$ and $\sigma_y$, with
$A = h_{11}/\Delta$, $B = 0$, and $C = -1$, for which the characteristic is

$$B^2 - 4AC = \frac{4 h_{11}}{\Delta}$$

where $h_{11} = \sum_{i=1}^{N} (1/\sigma_i^2)$ and
$\Delta = \sum_{i=1}^{N} (1/\sigma_i^2) \sum_{i=1}^{N} (x_i^2/\sigma_i^2) - \left[\sum_{i=1}^{N} (x_i/\sigma_i^2)\right]^2$.
Letting $1/\sigma_i = a_i$ and $x_i/\sigma_i = b_i$ in the latter, and applying the
Cauchy-Schwarz inequality,

$$\left(\sum_{i=1}^{N} a_i b_i\right)^2 \le \sum_{i=1}^{N} a_i^2 \sum_{i=1}^{N} b_i^2$$

we see that $B^2 - 4AC > 0$, so that the equation for $\sigma_y(x)$ is that of a hyperbola.
Similarly, the discriminant of the quadratic on the right-hand side of Eq. (12) is
$b^2 - 4ac = -(4/\Delta) < 0$, so that the roots of the equation $\sigma_y^2(x) = 0$ are complex,
i.e., the two branches of the hyperbola, $\pm\sigma_y(x)$, never cross the x-axis and are
symmetric about it. The minimum value of $\sigma_y^2(x)$ occurs at $x_{min} = h_{12}/h_{11}$, for
which the variance is $\sigma_y^2|_{min} = 1/h_{11}$. For constant data variances,
$\sigma_i = \sigma$, $x_{min} = (1/N)\sum_{i=1}^{N} x_i = \bar{x}$, and
$\sigma_y^2|_{min} = (1/h_{11}) = (\sigma^2/N)$, while at $x = \bar{x} \pm \sigma_x$,

$$\sigma_y^2(\bar{x} \pm \sigma_x) = \frac{2}{h_{11}} = 2\,\frac{\sigma^2}{N} \qquad (13)$$

showing that the mean value of $\sigma_y^2(x)$ occurs at one standard deviation from the mean
value of $x$.

If the values of the matrix elements $h_{jk}$ given by Eq. (11) are substituted into Eq. (12) for
the special case $\sigma_i = \sigma$, it may be transformed into the simple dimensionless form

$$\eta^2 = 1 + \xi^2 \qquad (14)$$

where $\eta = \sqrt{N}\,[\sigma_y(x)/\sigma]$ and $\xi = [(x - \bar{x})/\sigma_x]$. The universal
error curve represented by Eq. (14) is plotted, along with the higher-degree polynomial error
curves discussed below, in Fig. 5.
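
The reduction of Eq. (12) to Eq. (14) can be confirmed numerically. The sketch below is only a check under stated assumptions: the function name `line_fit_variance` and the sample points are hypothetical, and constant data errors are assumed throughout.

```python
import math

def line_fit_variance(x_eval, x, sigma):
    """Eq. (12) for constant data errors: (h22 - 2*h12*x + h11*x^2) / Delta."""
    h11 = len(x) / sigma ** 2
    h12 = sum(x) / sigma ** 2
    h22 = sum(xi * xi for xi in x) / sigma ** 2
    delta = h11 * h22 - h12 ** 2
    return (h22 - 2.0 * h12 * x_eval + h11 * x_eval ** 2) / delta

# Hypothetical, unevenly spaced data points with constant error sigma.
x = [2.0, 3.0, 5.0, 7.0, 11.0]
sigma = 0.5
n = len(x)
xbar = sum(x) / n
sigma_x = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / n)   # 1/N definition of the variance

checks = []
for x_eval in (-4.0, xbar, xbar + sigma_x, 20.0):
    eta_sq = n * line_fit_variance(x_eval, x, sigma) / sigma ** 2
    xi_sq = ((x_eval - xbar) / sigma_x) ** 2
    checks.append((eta_sq, 1.0 + xi_sq))
    print(x_eval, eta_sq, 1.0 + xi_sq)   # the two columns should agree
```

At `x_eval = xbar` the normalized variance is 1, and at `xbar + sigma_x` it is 2, in agreement with Eq. (13).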

B. More General Fits 

The case of a general polynomial fit may be treated by the same methods used above, but the
algebra becomes progressively more difficult for $M > 2$. However, one can see that for
higher-degree polynomial fits, the general behavior illustrated by the straight-line case dealt
with above is also found. For example, since the highest power present in the expansion for
$\sigma_y^2(x)$ is $2(M - 1)$ while the fit itself is of degree $M - 1$, the equation for
$\sigma_y(x)$ corresponds to a double-branched function whose asymptotic behavior for
$|x - \bar{x}| \gg \sigma_x$ follows the same power law as the fit itself. Thus, a quadratic fit
yields an error that increases asymptotically as the second power of $|x - \bar{x}|$, a cubic as
the third power, etc. The implication of this is that higher-degree fits become less reliable
than ones of lower degree as one moves away from the center of the given data.
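
The asymptotic power law can be seen by evaluating Eq. (8) far outside the data. The sketch below is illustrative only: the names (`invert`, `poly_fit_sigma`) and the sample grid are assumptions, and unit data errors are used because only ratios of $\sigma_y$ are examined.

```python
import math

def invert(H):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    m = len(H)
    aug = [list(map(float, H[r])) + [float(r == c) for c in range(m)] for r in range(m)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(m):
            if r != c:
                f = aug[r][c]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[c])]
    return [row[m:] for row in aug]

def poly_fit_sigma(x_eval, x, degree, sigma):
    """Standard error of a polynomial fit at x_eval for constant data errors, from Eq. (8)."""
    m = degree + 1
    H = [[sum(xi ** (j + k) for xi in x) / sigma ** 2 for k in range(m)] for j in range(m)]
    C = invert(H)
    return math.sqrt(sum(C[j][k] * x_eval ** (j + k) for j in range(m) for k in range(m)))

# Hypothetical data grid: 21 uniformly spaced points centered on zero.
x = [float(i) for i in range(-10, 11)]

# Doubling the distance from the center, far outside the data (x = 100 -> 200):
ratio_line = poly_fit_sigma(200.0, x, 1, 1.0) / poly_fit_sigma(100.0, x, 1, 1.0)
ratio_quad = poly_fit_sigma(200.0, x, 2, 1.0) / poly_fit_sigma(100.0, x, 2, 1.0)
print(ratio_line, ratio_quad)   # approaching 2 and 4, i.e., first and second power
```

Doubling the distance from the centroid roughly doubles the straight-line error but quadruples the quadratic one, which is the sense in which extrapolation with higher-degree fits is less reliable.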

It is shown in the Appendix that if the data variances are constant, and the data points are
uniformly distributed along the x-axis, then the variance of the fitted function $\sigma_y^2(x)$
is a symmetric function of $x - \bar{x}$, i.e.,
$\sigma_y(\bar{x} + \Delta x) = \sigma_y(\bar{x} - \Delta x)$. Furthermore, it is possible to
express $\sigma_y^2(x)$ in the same dimensionless form as for the straight-line case for the
higher-degree polynomial fits. Thus, for the general polynomial of degree $M - 1$, a set of
universal, normalized error curves exists, the square of which is given by

$$\eta^2 = \sum_{j=0}^{M-1} A_{2j}^{(M-1)}(N)\, \xi^{2j}$$

where

$$\eta = \sqrt{N}\,\frac{\sigma_y(x)}{\sigma}, \qquad \xi = \frac{x - \bar{x}}{\sigma_x}, \qquad \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \sigma_x^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2$$

While the coefficients appearing in the above expression generally depend on the number of data
points, $N$, they do not for the straight-line case. Furthermore, for values of $N$ exceeding that
required for a given fit by some reasonable number, say 5, they become essentially constant for
polynomial fits of all degrees. These results are summarized in Table 1, which lists the
coefficients, including their limiting values for $N \to \infty$, for straight-line, quadratic,
and cubic fits, corresponding to $M = 2$, 3, and 4. The corresponding normalized standard
deviations of the fit, $\eta$, are plotted in Fig. 5.

In the general case where the basis functions are not polynomials, the standard errors are not constant, 
or the data points are not uniformly distributed, one must use Eq. (8) to determine the behavior of the 
error as a function of the independent variable, so that this equation takes on fundamental importance in 
determining confidence limits resulting from a given fit to a given set of data. An example of the direct 
use of this equation is shown in Figs. 1 through 4, where a set of antenna aperture efficiency data has 
been fitted to a quadratic function. The figures illustrate the behavior of the standard error of the fit 
over a wide range of elevation angles, including those beyond the existing data. We have assumed from 
the outset that the data errors are normally distributed, and under these circumstances, it may be shown 
that the errors in the coefficients and, consequently, also in the fitting function y(x) are also normally 
distributed, so that Gaussian statistics may be used to determine these limits. 

As a final comment, it should be observed that the above analysis assumes that the fit is
meaningful, i.e., that $\chi_\nu^2 \approx 1$, and this is crucial to a proper interpretation of
the fitting errors. If the fitting function has been improperly chosen, one may not assume that
the errors follow Eq. (8) and, equally important, if the data errors have been incorrectly
estimated, one may not assume that the fit is meaningful even if $\chi_\nu^2 \approx 1$, since
these errors appear directly in the expression for $\chi^2$ [Eq. (1)]. It is, therefore, of the
utmost importance to have a reasonably accurate estimate of the actual data errors if one is to
have reasonable estimates of the resulting fitting errors.


VII. The Nonlinear Least-Squares Problem 

The linear, least-squares problem is characterized by Eqs. (2) and (3), which, respectively, express the
fitting function $y(x; a)$ as a linear and $\chi^2$ as a quadratic function of the coefficients $a_j$, thus allowing an
explicit solution to be obtained for the latter, as shown in Section IV. In the nonlinear case, no such
simple formulation is possible, and one is led to an iterative procedure for the solution that starts with
an assumed solution vector $a$ and evolves so as to produce values closer and closer to the value $a_0$, which
minimizes $\chi^2$. The iteration is based on the assumption that if $a$ is sufficiently close to $a_0$, then $y(x; a)$
may still be expressed as a linear and $\chi^2$ as a quadratic function of the coefficients. This assumption
rests on using Taylor expansions for these quantities and retaining only the leading terms. Thus, for $a$
near $a_0$, we may write


$$y(x; a) = y(x; a_0) + \sum_{j=1}^{M} \left. \frac{\partial y(x; a)}{\partial a_j} \right|_{a_0} (a_j - a_{0j}) = y(x; a_0) + (a - a_0)^T d(x; a_0) \tag{15}$$

where $[d(x; a_0)]_j = d_j(x; a_0) = (\partial y(x; a)/\partial a_j)|_{a_0}$. Similarly,


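$$\chi^2(a) = \chi^2_{\min} + \frac{1}{2} \sum_{j=1}^{M} \sum_{k=1}^{M} \left. \frac{\partial^2 \chi^2(a)}{\partial a_j\, \partial a_k} \right|_{a_0} (a_j - a_{0j})(a_k - a_{0k}) = \chi^2_{\min} + (a - a_0)^T H(a_0)\, (a - a_0) \tag{16}$$

where $\chi^2_{\min} = \chi^2(a_0)$, and

$$h_{jk}(a_0) = \frac{1}{2} \left. \frac{\partial^2 \chi^2(a)}{\partial a_j\, \partial a_k} \right|_{a_0} = \sum_{i=1}^{N} \frac{1}{\sigma_i^2} \left. \frac{\partial y(x_i; a)}{\partial a_j} \right|_{a_0} \left. \frac{\partial y(x_i; a)}{\partial a_k} \right|_{a_0}$$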


Note that the gradient term vanishes in the above expression for $\chi^2(a)$ since it is evaluated at the
minimum. Also, the final expression for $h_{jk}(a_0)$ involves only first derivatives because of the assumed
form for $y(x; a)$ given by Eq. (15). If we now define

$$A_{ij}(a_0) = \frac{1}{\sigma_i} \left. \frac{\partial y(x_i; a)}{\partial a_j} \right|_{a_0} = \frac{1}{\sigma_i}\, d_{ij}(a_0)$$


then $H(a_0) = A^T A$ just as in the linear case and, hence, the derivation leading up to Eq. (6) carries
through just as before, giving us

$$\sigma_a^2(a_0) = C(a_0) = H^{-1}(a_0)$$


Also, the derivation leading up to Eq. (8) follows through as before when $d_j(x; a_0)$ is substituted for
$X_j(x)$, giving the result

$$\sigma_y^2(x; a_0) = \sum_{j=1}^{M} \sum_{k=1}^{M} C_{jk}(a_0)\, d_j(x; a_0)\, d_k(x; a_0) = d(x; a_0)^T\, C(a_0)\, d(x; a_0)$$

as does that leading up to Eqs. (9) and (10).
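The nonlinear variance of the fit can be evaluated numerically in a few lines. The following is a minimal sketch, assuming NumPy and a Gaussian model; the helper names `gaussian_derivs` and `fit_variance`, the data grid, and the solution vector `a0` are illustrative choices, not taken from the text. It forms $A_{ij} = d_j(x_i; a_0)/\sigma_i$, inverts $H = A^T A$, and evaluates $d^T C d$.

```python
import numpy as np

def gaussian_derivs(x, a):
    """Rows are d(x; a) = dy/da for y = a1 * exp(-(x - a2)**2 / (2 * a3**2))."""
    a1, a2, a3 = a
    t = (np.asarray(x, dtype=float) - a2) / a3
    e = np.exp(-0.5 * t**2)
    return np.column_stack([e, a1 * t / a3 * e, a1 * t**2 / a3 * e])

def fit_variance(x_eval, x_data, sigma, a0):
    """sigma_y^2(x) = d^T C d with C = H^-1, H = A^T A, A_ij = d_j(x_i)/sigma_i."""
    A = gaussian_derivs(x_data, a0) / np.asarray(sigma, dtype=float)[:, None]
    C = np.linalg.inv(A.T @ A)
    d = gaussian_derivs(x_eval, a0)
    return np.einsum("ij,jk,ik->i", d, C, d), C

x_data = np.linspace(-3.0, 3.0, 21)        # illustrative data grid
sigma = np.full(x_data.size, 0.05)         # illustrative constant data errors
a0 = (1.0, 0.0, 1.0)                       # assumed solution vector
var_y, C = fit_variance(x_data, x_data, sigma, a0)
```

Evaluating `var_y` at the data points themselves gives a quick consistency check, since the weighted sum of $\sigma_y^2(x_i)/\sigma_i^2$ over the data should equal the number of fitted coefficients.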
Thus, the results

$$\frac{1}{N} \sum_{i=1}^{N} \frac{\sigma_y^2(x_i)}{\sigma_i^2} = \frac{M}{N}$$

and

$$\frac{1}{N} \sum_{i=1}^{N} \sigma_y^2(x_i) = \frac{M}{N}\, \sigma^2$$

continue to hold in the general, nonlinear case so that with the appropriate definitions for the quantities
involved, the nonlinear and linear, least-squares problems lead to the same formal results. Note, however,
that all of the quantities now explicitly depend on the solution vector, $a_0$.

Furthermore, within the assumptions made in writing down Eqs. (15) and (16), i.e., assuming that
the deviations $\delta a_j$ are not so large as to invalidate the linear and quadratic approximations involved, one
may assume that all of the distributions are normal and, hence, that the usual confidence limits derived
from such distributions hold. Thus, the limits for a confidence level of 68.3 percent are $\pm\sigma_X$, those for
95.4 percent $\pm 2\sigma_X$, for 99.73 percent $\pm 3\sigma_X$, etc., where $X$ corresponds to either $a$ or $y$. As an example
of the application of the above theory to a specific case, the nonlinear fitting problem for the Gaussian
function

$$y(x; a) = a_1 \exp\left[ \frac{-(x - a_2)^2}{2a_3^2} \right]$$


is treated in the Appendix, where it is shown that the normalized variance of the fit for the case of
constant data errors is given by

$$\eta^2 = \sqrt{\frac{3}{\pi}}\, \sigma_t\, e^{-\sigma_t^2 \xi^2} \left( 3 + 4\sigma_t^4 \xi^4 \right)$$

under the assumptions that the number of data points is reasonably large, say 20 or more, that they are
uniformly and symmetrically distributed about the peak $x = a_2$, and that they extend out to a distance
of $x - a_2 = \pm 2a_3$ or more. In the above expression,

$$\sigma_t = \frac{\sigma_x}{a_3}$$

and

$$\xi = \frac{x - \bar{x}}{\sigma_x}$$

where $\sigma_x^2 = (1/N) \sum_{i=1}^{N} x_i^2 - (1/N^2) \left( \sum_{i=1}^{N} x_i \right)^2$, as before. This equation is plotted in Fig. 6 for $\sigma_t = 2/\sqrt{3}$
and $4/\sqrt{3}$, corresponding to $L/a_3 = 4$ and 8, where $L$ is the full range of the data points.

If the data errors, instead of being constant, are proportional to the value of the function, $\sigma_i = \sigma(x_i) \propto
y(x_i; a)$, then the normalized error of the fit is given by

$$\eta^2 = A_0^{(2)}(N) + A_2^{(2)}(N)\, \xi^2 + A_4^{(2)}(N)\, \xi^4$$

where now $\eta = \sqrt{N}\, \sigma_y(x)/\sigma(x)$ and the only restriction on the data points is that they be uniformly
and symmetrically distributed relative to the peak. The coefficients $A_j^{(2)}(N)$ appearing in the above
expression are the same as those found for the quadratic polynomial fit, and their values, including those
for $N \to \infty$, are given in Table 1. The equation is plotted in Fig. 7.

The functional dependence of the parameter errors, $\sigma_{a_j}$, on the data interval, $L$, and the number of
points, $N$, is frequently of importance in determining an optimum sampling strategy, and this is derived
in Section V of the Appendix, with the results for constant data errors plotted in Fig. 8.


VIII. A Related Problem 

It is sometimes of interest to determine the variance of the difference of the fit at two points $x_1$ and
$x_2$, $\Delta y(x_1, x_2) = y(x_1; a) - y(x_2; a)$. Thus, from Eq. (13), we have

$$\delta[\Delta y(x_1, x_2)] = \delta y(x_1; a) - \delta y(x_2; a) = \sum_{j=1}^{M} [d_j(x_1) - d_j(x_2)]\, \delta a_j$$

so that

$$\sigma_{1,2}^2 = \left\langle \{\delta[\Delta y(x_1, x_2)]\}^2 \right\rangle = \sum_{j=1}^{M} \sum_{k=1}^{M} \langle \delta a_j\, \delta a_k \rangle (d_{1j} - d_{2j})(d_{1k} - d_{2k}) = \sum_{j=1}^{M} \sum_{k=1}^{M} C_{jk} (d_{1j} - d_{2j})(d_{1k} - d_{2k})$$

For the case of a polynomial expansion, $d_{ij} = X_{ij} = x_i^{j-1}$, so

$$\sigma_{1,2}^2 = \sum_{j=1}^{M} \sum_{k=1}^{M} C_{jk} \left( x_1^{j-1} - x_2^{j-1} \right) \left( x_1^{k-1} - x_2^{k-1} \right)$$

and, hence, using the results of Section VI, we obtain

$$\sigma_{1,2}^2 = C_{22} (x_1 - x_2)^2 = \sigma_{a_2}^2 (x_1 - x_2)^2$$

for a straight-line fit. For constant data errors,

$$\sigma_{1,2}^2 = \frac{1}{N} \frac{\sigma^2}{\sigma_x^2} (x_1 - x_2)^2$$
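As a hedged illustration of the difference variance, the sketch below evaluates the double sum directly for a polynomial fit and compares it with the straight-line closed form for constant data errors. NumPy is assumed; `diff_variance` and the sample values are hypothetical names and numbers chosen for the example.

```python
import numpy as np

def diff_variance(x1, x2, x_data, sigma, M):
    """Variance of y(x1) - y(x2) for a polynomial fit with M coefficients."""
    X = np.vander(x_data, M, increasing=True)      # X_ij = x_i**(j-1)
    C = np.linalg.inv(X.T @ X / sigma**2)          # covariance matrix, C = H^-1
    d = np.array([x1**j - x2**j for j in range(M)], dtype=float)
    return d @ C @ d

x_data = np.array([0.0, 1.0, 2.5, 4.0, 7.0])       # need not be uniform for M = 2
sigma = 0.2                                        # constant data error (illustrative)
var_12 = diff_variance(1.0, 6.0, x_data, sigma, M=2)
closed_form = sigma**2 / x_data.size * (1.0 - 6.0) ** 2 / np.var(x_data)
```

Here `np.var` uses its default population form, which matches the definition of $\sigma_x^2$ used in the text.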


IX. Discussion 

In applying the above results, it is important to be aware that the variances of the fit obtained in 
Eq. (8) and related equations are valid only if one is reasonably confident of the fit itself. If the fitting 
function being considered does not accurately represent the data, then one must find one that does before 
attempting to assign appropriate errors, and the best test for this is the value of the reduced $\chi^2$. In
this connection, it is worth mentioning that while fits using ordinary polynomials are often used for
convenience and simplicity, they seldom correspond to physical reality. They may approximate a given 
physical process reasonably well over a limited range of the independent variable x, but one should exercise 
care when attempting to go beyond this, because the hallmark of these functions is that they diverge for 
large absolute values of x, and the variances of the fit derived for the polynomials in the Appendix and 
discussed in Section VI offer a warning of this by doing likewise in the region beyond the existing data. 

Clearly, one is better off with a fitting function that reflects the underlying physics reasonably well, but 
of course it is not always possible to find such a function, especially if the system being studied has a large 
number of basic processes going on at the same time, some of which may be unknown to the observer. 
A good example is trying to determine the microwave spectrum of a complex astrophysical source of 
radiation, such as a galaxy, or even a planet, where not only is the physics not known with certainty, 
but also the data have typically been gathered by many different workers using different measuring 
systems with different accuracies at different frequencies. Under such circumstances, one cannot be 
sure that some hidden absorption or emission feature has not gone undetected because of a gap in the 
measurements and knowledge, over some small range of frequencies, possibly at one of the extremes 
of the existing measurements. A case in point is the HII region DR21, where early measurements by 
workers identified the basic radiation process as thermal bremsstrahlung and, on that basis, predicted its 
microwave spectrum out to frequencies of 100 GHz, unaware of the fact that the region was surrounded 
by a dust cloud that converted intense visible and ultraviolet radiation to near-infrared radiation, which 
resulted in significant departure from the predicted spectrum at frequencies as low as 85 GHz. The 
existing measurements of the time did not extend beyond 31.4 GHz, so considerable extrapolation was 
involved, even on a logarithmic scale. 

Spectral determinations are particularly treacherous, especially at the extremes of the data, because, 
in principle, the range of the independent variable, frequency, is open at both ends, thus precluding a 
knowledge of the asymptotic behavior. At the opposite extreme, however, one may be dealing with a 
relatively uncomplicated situation involving a finite and known range for the independent variable and a 
system that can reliably be modeled in a fairly simple manner. In such a case, it pays to determine the 
errors as accurately as one can, i.e., to make maximum use of the given data by applying the variance of 
the fit theory discussed above. 

The question of how best to determine the actual data errors is not an easy one. If the measuring 
system and the system under study are simple, it may be possible to determine these from first principles, 
combined with the performance data for the instrumentation used, and it might further be possible to 
check this by a series of measurements on a known system. Often, however, one does not have such a 
simple circumstance to deal with, nor does one have the time to make the requisite test measurements.
In this case, one might obtain a reasonable estimate of the data errors by breaking up the range of the
independent variable into a small number of segments over each of which the functional dependence may be
reasonably well approximated by a quadratic, and fitting each of these restricted data sets by a quadratic
assuming $\sigma = 1$. Then, $\chi_\nu^2(\sigma = 1) = (1/\nu) \sum_{i=1}^{N} [y_i - y(x_i)]^2$ and $\chi_\nu^2(\sigma) = (1/\nu\sigma^2) \sum_{i=1}^{N} [y_i - y(x_i)]^2 \approx 1$
so that $\sigma^2 \approx \chi_\nu^2(\sigma = 1)$. A plot of the resulting values of $\sigma$ as a function of the mean value of the
independent variable for each segment would then provide an estimate for $\sigma(x)$, which would in turn
permit an assignment of the appropriate values for the $\sigma_i$'s for the final overall fit. In the final analysis,
as always, common sense and experience must be the ultimate guide in determining how best to proceed.
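The segment-by-segment estimate of the data errors described above might be sketched as follows. This is only an illustration under stated assumptions: NumPy is used, `segment_sigma` is a hypothetical helper, the synthetic data are invented for the example, and $\nu$ is taken to be the number of points in a segment minus the three fitted coefficients.

```python
import numpy as np

def segment_sigma(x, y, n_segments):
    """Estimate sigma(x) from unweighted quadratic fits to consecutive segments."""
    order = np.argsort(x)
    out = []
    for xs, ys in zip(np.array_split(x[order], n_segments),
                      np.array_split(y[order], n_segments)):
        coeffs = np.polyfit(xs, ys, 2)              # quadratic fit with sigma = 1
        resid = ys - np.polyval(coeffs, xs)
        nu = xs.size - 3                            # assumed degrees of freedom
        out.append((xs.mean(), np.sqrt(np.sum(resid**2) / nu)))
    return out

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 400)
y = 1.0 + 0.3 * x - 0.02 * x**2 + rng.normal(0.0, 0.5, x.size)   # synthetic data
estimates = segment_sigma(x, y, n_segments=4)      # list of (mean x, sigma estimate)
```

Each returned pair gives the mean of the independent variable in a segment and the corresponding estimate of $\sigma$, which can then be plotted or interpolated to assign errors for the overall fit.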
Appendix

Mathematical Details 


I. Symmetry Considerations in Polynomial Fits 

If the basis functions chosen are the ordinary polynomials and constant data errors are assumed, the
curvature matrix elements are given by

$$h_{jk} = \frac{1}{\sigma^2} \sum_{i=1}^{N} x_i^{j+k-2}$$

Furthermore, if the data points are uniformly distributed, we may choose our y-axis so that $\sum_{i=1}^{N} x_i^{j+k} = 0$
for $j + k$ = odd integer.$^3$ Then, the curvature matrix has the general appearance

$$H = \begin{bmatrix} h_{11} & 0 & h_{13} & 0 & \cdots \\ 0 & h_{22} & 0 & h_{24} & \cdots \\ h_{13} & 0 & h_{33} & 0 & \cdots \\ 0 & h_{24} & 0 & h_{44} & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix} \tag{A-1}$$

where the symmetry has been explicitly indicated, i.e., $h_{jk} = h_{kj}$. The elements of the covariance matrix
are given by

$$C_{jk} = \frac{1}{\Delta} \times \mathrm{cofactor}(h_{jk})$$

where $\Delta = \det H$ and use has been made of the symmetry of $H$. By an inductive process, one can show
that if $j + k$ is an odd integer, then the cofactor matrix will, for a given rank of the curvature matrix,
have $n + 1$ rows of identical sequences of zero and nonzero elements, where the number of the latter in
a given such row is $n$. Thus, if the rank of $H$ is 4 or 5, $n = 1$, if it is 6 or 7, $n = 2$, etc. In view of this
structure of the cofactor matrix, one may perform a Gauss elimination on these rows and thereby end up
with one row consisting of nothing but zeroes, in consequence of which the corresponding cofactor and,
hence, also the covariance matrix element, $C_{jk}$, vanish for $j + k$ equal to an odd integer. The covariance
matrix, therefore, has the same structure as the curvature matrix, so that Eq. (8) for the variance of the
fit becomes $\sigma_y^2(x) = \sum_{j=1}^{M} \sum_{k=1}^{M} C_{jk}\, x^{j+k-2}$ for $j + k$ = even integer. Hence, $\sigma_y^2(x)$ is an even function of
$x$, i.e., the variance of the fit for polynomials is symmetric about the mean of the data points if these are
uniformly distributed and the data errors are constant.
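The symmetry argument can be checked numerically. The sketch below, which assumes NumPy and uses an illustrative choice of $N = 9$ uniformly spaced points and a cubic fit ($M = 4$), inverts the curvature matrix and evaluates the variance of the fit at mirrored positions; `var_fit` is a hypothetical helper name.

```python
import numpy as np

N, M, sigma = 9, 4, 1.0                             # illustrative values
x = np.arange(N) - (N - 1) / 2                      # uniform points, centered on their mean
X = np.vander(x, M, increasing=True)                # X_ij = x_i**(j-1)
C = np.linalg.inv(X.T @ X / sigma**2)               # covariance matrix

def var_fit(xv):
    """Variance of the fit, d^T C d, with d_j = xv**(j-1)."""
    d = xv ** np.arange(M)
    return d @ C @ d
```

With zero-based indices the parity of `j + k` is the same as for the one-based indices of the text, so the elements `C[j, k]` with odd `j + k` are the ones expected to vanish.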


II. Polynomial Fits for M = 2, 3, and 4 With Constant Data Errors and Uniformly 
Distributed Data Points 

A. Straight-Line Fits 

While straight-line fits have already been dealt with in Section VI, the simplicity of this case makes
it a useful starting point for discussion of the above ideas. Thus, assuming constant data errors and
uniformly distributed data points, we may introduce the variable $x' = x - \bar{x}$, where $\bar{x} = (1/N) \sum_{i=1}^{N} x_i$,
in terms of which the curvature and covariance matrices have the sparse form shown in Eq. (A-1) above.

$^3$ The y-axis location affects the covariance matrix, but not the variance of the fit.

Then, for $M = 2$, we have

$$H = \begin{bmatrix} h_{11} & 0 \\ 0 & h_{22} \end{bmatrix}$$

so that $\Delta = \det H = h_{11} h_{22}$, and


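$$C = \begin{bmatrix} \dfrac{1}{h_{11}} & 0 \\ 0 & \dfrac{1}{h_{22}} \end{bmatrix} = \frac{\sigma^2}{N} \begin{bmatrix} 1 & 0 \\ 0 & \dfrac{1}{\sigma_x^2} \end{bmatrix}$$

where $\sigma_x^2 = (1/N) \sum_{i=1}^{N} (x_i - \bar{x})^2 = (1/N) \sum_{i=1}^{N} x_i'^2$. Substituting into Eq. (8), we thus have

$$\sigma_y^2(x) = \frac{\sigma^2}{N} \begin{bmatrix} 1 & x' \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & \dfrac{1}{\sigma_x^2} \end{bmatrix} \begin{bmatrix} 1 \\ x' \end{bmatrix} = \frac{\sigma^2}{N} \left[ 1 + \left( \frac{x'}{\sigma_x} \right)^2 \right]$$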


This may be written in the normalized form

$$\eta^2 = 1 + \xi^2 \tag{A-2}$$

where $\eta = \sqrt{N}\, (\sigma_y(x)/\sigma)$ and $\xi = (x - \bar{x})/\sigma_x$, in agreement with the results of Section VI.A [Eq. (14)].

Note that while the above symmetric result has been obtained under the restrictive condition that 
the data points are uniformly distributed, no such assumption was made in the original derivation given 
in Section VI, i.e., as long as the data variances are constant, the data points can be distributed in any 
manner and the result of Eq. (A- 2 ) still holds. As we shall see in what follows, this is a special feature of 
the straight-line fit that does not carry over to polynomial fits of higher degree. 
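That the straight-line result holds for arbitrarily placed data points can be illustrated with a short calculation. The following is a minimal sketch using only the Python standard library; `line_fit_variance` and the sample points are hypothetical. It inverts the 2 x 2 curvature matrix explicitly, without shifting the origin, and compares the normalized variance with $1 + \xi^2$.

```python
import math

def line_fit_variance(x, x_data, sigma):
    """sigma_y^2(x) for y = a1 + a2*x with constant data errors, from C = H^-1."""
    n = len(x_data)
    sx = sum(x_data)
    sxx = sum(v * v for v in x_data)
    det = n * sxx - sx * sx                         # sigma^4 * det(H)
    c11 = sigma**2 * sxx / det
    c12 = -sigma**2 * sx / det
    c22 = sigma**2 * n / det
    return c11 + 2.0 * c12 * x + c22 * x * x

x_data = [0.0, 0.3, 1.0, 4.0, 4.2, 9.0]             # deliberately non-uniform
sigma, x = 0.5, 6.0                                 # illustrative values
mean = sum(x_data) / len(x_data)
sigma_x = math.sqrt(sum((v - mean) ** 2 for v in x_data) / len(x_data))
eta_sq = len(x_data) * line_fit_variance(x, x_data, sigma) / sigma**2
xi = (x - mean) / sigma_x
```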

B. Quadratic Fits 

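The general form for the variance of the fit for the quadratic case is

$$\sigma_y^2(x) = \begin{bmatrix} 1 & x' & x'^2 \end{bmatrix} \begin{bmatrix} C_{11} & 0 & C_{13} \\ 0 & C_{22} & 0 \\ C_{13} & 0 & C_{33} \end{bmatrix} \begin{bmatrix} 1 \\ x' \\ x'^2 \end{bmatrix}$$

where $x' = x - \bar{x}$ as before, and

$$H = \begin{bmatrix} h_{11} & 0 & h_{13} \\ 0 & h_{22} & 0 \\ h_{13} & 0 & h_{33} \end{bmatrix} \tag{A-3}$$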


Inverting this, the matrix elements of $C$ are

$$\begin{aligned}
C_{11} &= \frac{h_{33}}{h_{11} h_{33} - h_{13}^2} \\
C_{13} &= -\frac{h_{13}}{h_{11} h_{33} - h_{13}^2} \\
C_{22} &= \frac{1}{h_{22}} \\
C_{33} &= \frac{h_{11}}{h_{11} h_{33} - h_{13}^2}
\end{aligned} \tag{A-4}$$

where $h_{11} = N/\sigma^2$, $h_{13} = h_{22} = (1/\sigma^2) \sum_{i=1}^{N} x_i'^2 = N(\sigma_x^2/\sigma^2)$, and $h_{33} = (1/\sigma^2) \sum_{i=1}^{N} x_i'^4$. The last term
may be evaluated by making use of the uniformity of the distribution of the points. Thus, designating
the spacing between adjacent points by $\delta$, we have


$$\sum_{i=1}^{N} x_i'^4 = 2\delta^4 \sum_{n=1}^{(N-1)/2} n^4 = \frac{\delta^4}{240}\, N (N^2 - 1)(3N^2 - 7)$$

where the last step has used the relation

$$\sum_{k=1}^{n} k^4 = \frac{1}{30}\, n(n + 1)(2n + 1)(3n^2 + 3n - 1)$$

from [4, Eq. (0.121.4)], and for definiteness we have assumed an odd number of points.$^4$ Similarly,
applying the relation $\sum_{k=1}^{n} k^2 = (1/6)\, n(n + 1)(2n + 1)$ to the expression for $h_{22}$ above, we obtain the
following relationship between the point spacing, $\delta$, and the standard deviation of the data points, $\sigma_x$:


$$\delta^2 = \frac{12}{N^2 - 1}\, \sigma_x^2$$

so that we may write $h_{33}$ in the form

$$h_{33} = \frac{3(3N^2 - 7)}{5(N^2 - 1)} \left( \frac{N \sigma_x^4}{\sigma^2} \right)$$
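The moment sums used in this step are easy to verify with exact rational arithmetic. The sketch below uses only the standard library `fractions` module; `moment_check` is a hypothetical helper, and the grid spacing passed in is arbitrary. It compares the fourth-moment sum and $\sigma_x^2$ for $N$ uniformly spaced, zero-mean points with the closed forms, and returns the ratio $\sigma^2 h_{33}/(N\sigma_x^4)$.

```python
from fractions import Fraction

def moment_check(N, delta=Fraction(1)):
    """Compare sum(x'^4) and sigma_x^2 for N uniform points with the closed forms."""
    xs = [delta * Fraction(2 * i - N + 1, 2) for i in range(N)]   # zero-mean grid
    s4 = sum(v**4 for v in xs)
    var_x = sum(v**2 for v in xs) / N                             # sigma_x^2
    s4_closed = delta**4 * N * (N**2 - 1) * (3 * N**2 - 7) / 240
    var_closed = delta**2 * (N**2 - 1) / 12       # from delta^2 = 12*sigma_x^2/(N^2 - 1)
    ratio = s4 / (N * var_x**2)                   # sigma^2 * h33 / (N * sigma_x^4)
    return s4 == s4_closed, var_x == var_closed, ratio
```

Because the grid is built for any `N`, the same function can be used to confirm that the closed forms do not depend on whether the number of points is even or odd.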


Substituting these results into Eq. (A-3), we finally obtain

$$\eta^2 = \frac{1}{4(N^2 - 4)} \left[ 3(3N^2 - 7) - 6(N^2 + 1)\, \xi^2 + 5(N^2 - 1)\, \xi^4 \right] = A_0^{(2)}(N) + A_2^{(2)}(N)\, \xi^2 + A_4^{(2)}(N)\, \xi^4$$

where $\xi = (x - \bar{x})/\sigma_x$, as before. In the limit as $N \to \infty$, and this limit is approached quite rapidly as
$N$ increases, this reduces to the simple fourth-degree polynomial

$$\eta^2 = \frac{9}{4} - \frac{3}{2}\, \xi^2 + \frac{5}{4}\, \xi^4$$

$^4$ The results obtained do not depend on whether $N$ is even or odd.


C. Cubic Fits 

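For the cubic case, the variance of the fit has the form

$$\sigma_y^2(x) = \begin{bmatrix} 1 & x' & x'^2 & x'^3 \end{bmatrix} \begin{bmatrix} C_{11} & 0 & C_{13} & 0 \\ 0 & C_{22} & 0 & C_{24} \\ C_{13} & 0 & C_{33} & 0 \\ 0 & C_{24} & 0 & C_{44} \end{bmatrix} \begin{bmatrix} 1 \\ x' \\ x'^2 \\ x'^3 \end{bmatrix} = C_{11} + (C_{22} + 2C_{13})\, x'^2 + (2C_{24} + C_{33})\, x'^4 + C_{44}\, x'^6 \tag{A-5}$$

and the curvature matrix is of the form

$$H = \begin{bmatrix} h_{11} & 0 & h_{13} & 0 \\ 0 & h_{22} & 0 & h_{24} \\ h_{13} & 0 & h_{33} & 0 \\ 0 & h_{24} & 0 & h_{44} \end{bmatrix}$$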


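Inverting this, we have

$$\begin{aligned}
C_{11} &= \frac{h_{33}}{h_{11} h_{33} - h_{13}^2} \\
C_{13} &= -\frac{h_{13}}{h_{11} h_{33} - h_{13}^2} \\
C_{22} &= \frac{h_{44}}{h_{22} h_{44} - h_{24}^2} \\
C_{24} &= -\frac{h_{24}}{h_{22} h_{44} - h_{24}^2} \\
C_{33} &= \frac{h_{11}}{h_{11} h_{33} - h_{13}^2} \\
C_{44} &= \frac{h_{22}}{h_{22} h_{44} - h_{24}^2}
\end{aligned}$$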


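where

$$\begin{aligned}
h_{11} &= \frac{N}{\sigma^2} \\
h_{13} = h_{22} &= \frac{1}{\sigma^2} \sum_{i=1}^{N} x_i'^2 = N\, \frac{\sigma_x^2}{\sigma^2} \\
h_{24} = h_{33} &= \frac{1}{\sigma^2} \sum_{i=1}^{N} x_i'^4 = \frac{3(3N^2 - 7)}{5(N^2 - 1)}\, N\, \frac{\sigma_x^4}{\sigma^2} \\
h_{44} &= \frac{1}{\sigma^2} \sum_{i=1}^{N} x_i'^6 = \frac{9(3N^4 - 18N^2 + 31)}{7(N^2 - 1)^2}\, N\, \frac{\sigma_x^6}{\sigma^2}
\end{aligned}$$

and the last term has been evaluated with the use of the formula $\sum_{k=1}^{n} k^6 = (1/42)\, n(n + 1)(2n + 1)(3n^4 + 6n^3 - 3n + 1)$, taken from [4, Eq. (0.121.6)].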

Substituting these results into Eq. (A-5), we obtain

$$\eta^2 = A_0^{(3)} + A_2^{(3)}\, \xi^2 + A_4^{(3)}\, \xi^4 + A_6^{(3)}\, \xi^6$$

where the coefficients are given by

$$\begin{aligned}
A_0^{(3)} &= \frac{3(3N^2 - 7)}{4(N^2 - 4)} \\
A_2^{(3)} &= \frac{5}{12}\, \frac{9N^4 - 30N^2 + 101}{(N^2 - 4)(N^2 - 9)} \\
A_4^{(3)} &= -\frac{5}{36}\, \frac{33N^4 - 50N^2 + 17}{(N^2 - 4)(N^2 - 9)} \\
A_6^{(3)} &= \frac{175}{108}\, \frac{(N^2 - 1)^2}{(N^2 - 4)(N^2 - 9)}
\end{aligned}$$

thus giving us the limiting result $\eta^2 = (9/4) + (15/4)\xi^2 - (55/12)\xi^4 + (175/108)\xi^6$ as $N \to \infty$.
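The cubic coefficients can be cross-checked independently of any closed form by building them from the moment sums with exact rational arithmetic. The sketch below uses only the standard library; `cubic_coeffs` and `eta_sq` are hypothetical helper names. Two properties make convenient checks: for $N = 4$ the cubic passes through every data point, so $\eta^2 = N$ there, and for any $N$ the values of $\eta^2$ at the data points sum to $NM$.

```python
from fractions import Fraction

def cubic_coeffs(N):
    """A0, A2, A4, A6 of the normalized variance for a cubic fit (sigma = 1)."""
    xs = [Fraction(2 * i - N + 1, 2) for i in range(N)]   # unit spacing, zero mean
    s2, s4, s6 = (sum(v**p for v in xs) for p in (2, 4, 6))
    det13 = N * s4 - s2**2                                # h11*h33 - h13^2
    det24 = s2 * s6 - s4**2                               # h22*h44 - h24^2
    c11, c13, c33 = s4 / det13, -s2 / det13, N / det13
    c22, c24, c44 = s6 / det24, -s4 / det24, s2 / det24
    vx = s2 / N                                           # sigma_x^2
    coeffs = (N * c11, N * (c22 + 2 * c13) * vx,
              N * (2 * c24 + c33) * vx**2, N * c44 * vx**3)
    xi2 = [v * v / vx for v in xs]                        # xi^2 at the data points
    return coeffs, xi2

def eta_sq(coeffs, xi2):
    """Evaluate A0 + A2*xi^2 + A4*xi^4 + A6*xi^6 for a given value of xi^2."""
    return sum(a * xi2**k for k, a in enumerate(coeffs))
```

For large `N` the returned coefficients approach the limiting values 9/4, 15/4, -55/12, and 175/108 quoted in the text.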


III. Gaussian Fits 

A. Constant Data Errors 

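The general Gaussian fitting function is given by

$$y(x; a) = a_1 \exp\left[ \frac{-(x - a_2)^2}{2a_3^2} \right] = a_1 \exp\left( \frac{-t^2}{2} \right)$$

where $t = (x - a_2)/a_3$. Thus, the required derivatives are

$$\begin{aligned}
d_1(x) &= \frac{\partial y}{\partial a_1} = \exp\left( \frac{-t^2}{2} \right) \\
d_2(x) &= \frac{\partial y}{\partial a_2} = \frac{a_1 t}{a_3} \exp\left( \frac{-t^2}{2} \right) \\
d_3(x) &= \frac{\partial y}{\partial a_3} = \frac{a_1 t^2}{a_3} \exp\left( \frac{-t^2}{2} \right)
\end{aligned} \tag{A-6}$$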


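For $\sigma$ = constant, $h_{jk} = \sum_{i=1}^{N} (1/\sigma^2)\, d_j(x_i)\, d_k(x_i)$, so the elements of the curvature
matrix are given by

$$\begin{aligned}
h_{11} &= \frac{1}{\sigma^2} \sum_{i=1}^{N} \exp(-t_i^2) \\
h_{12} &= \frac{r}{\sigma^2} \sum_{i=1}^{N} t_i \exp(-t_i^2) \\
h_{13} &= \frac{r}{\sigma^2} \sum_{i=1}^{N} t_i^2 \exp(-t_i^2) \\
h_{22} &= \frac{r^2}{\sigma^2} \sum_{i=1}^{N} t_i^2 \exp(-t_i^2) \\
h_{23} &= \frac{r^2}{\sigma^2} \sum_{i=1}^{N} t_i^3 \exp(-t_i^2) \\
h_{33} &= \frac{r^2}{\sigma^2} \sum_{i=1}^{N} t_i^4 \exp(-t_i^2)
\end{aligned}$$

where $r = a_1/a_3$.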


We see from this that unless the data points are located symmetrically relative to the peak of the
Gaussian, all six matrix elements will be nonzero, and the resulting variance of the fit will be a function
of the amount of offset. Therefore, we consider only the simplest, and in fact most common, case where
such a symmetry exists, at least approximately, and furthermore assume that the data points are uniformly
distributed, as in the polynomial case. Then $h_{12} = h_{23} = 0$, and the curvature and covariance matrices
have the same form dealt with above [see Eq. (A-1)]. Thus, proceeding as in the polynomial case, we
have

$$\begin{aligned}
h_{11} &= \frac{1}{\sigma^2} (2S_0 + 1) \\
h_{13} &= \frac{2}{\sigma^2}\, \delta_t^2\, r\, S_2 \\
h_{22} &= \frac{2}{\sigma^2}\, \delta_t^2\, r^2 S_2 \\
h_{33} &= \frac{2}{\sigma^2}\, \delta_t^4\, r^2 S_4
\end{aligned} \tag{A-7}$$


where $S_0 = \sum_{i=1}^{(N-1)/2} \exp(-i^2 \delta_t^2)$, $S_2 = \sum_{i=1}^{(N-1)/2} i^2 \exp(-i^2 \delta_t^2)$, $S_4 = \sum_{i=1}^{(N-1)/2} i^4 \exp(-i^2 \delta_t^2)$, and $\delta_t$, with $\delta_t^2 =
(12/[N^2 - 1])(\sigma_x^2/a_3^2) = (12/[N^2 - 1])\sigma_t^2$, is the spacing between points, normalized with respect to the
standard deviation of the Gaussian function, $a_3$, and $\sigma_t = \sigma_x/a_3$.

The above sums, which depend on $\delta_t$, cannot be expressed in closed form in the general case, but in
the limit as $N \to \infty$ and $\delta_t \to 0$, they may be expressed in terms of integrals. Thus, if we further restrict
ourselves to the most useful case where the data points extend out to at least two standard deviations of
the Gaussian, i.e., $L \geq 4a_3$, where $L$ is the full width of the interval of the data points along the x-axis,
then we find

$$\begin{aligned}
\lim_{N \to \infty,\ \delta_t \to 0} \delta_t S_0 &= \frac{\sqrt{\pi}}{2} \\
\lim_{N \to \infty,\ \delta_t \to 0} \delta_t^3 S_2 &= \frac{\sqrt{\pi}}{4} \\
\lim_{N \to \infty,\ \delta_t \to 0} \delta_t^5 S_4 &= \frac{3\sqrt{\pi}}{8}
\end{aligned} \tag{A-8}$$


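From the general form of the covariance matrix for symmetric data points with $M = 3$ [see Eq. (A-3),
for example] and the derivatives given by Eq. (A-6), the variance of the fit becomes

$$\sigma_y^2(x) = d^T(x)\, C\, d(x) = e^{-t^2} \begin{bmatrix} 1 & rt & rt^2 \end{bmatrix} \begin{bmatrix} C_{11} & 0 & C_{13} \\ 0 & C_{22} & 0 \\ C_{13} & 0 & C_{33} \end{bmatrix} \begin{bmatrix} 1 \\ rt \\ rt^2 \end{bmatrix} = e^{-t^2} \left[ C_{11} + (C_{22}\, r + 2C_{13})\, r t^2 + C_{33}\, r^2 t^4 \right]$$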


Substituting Eqs. (A-7) and (A- 8 ) into Eq. (A-4), we finally obtain 


V 


2 



(3 + 4a t V) 


where (L 00 /a 3 ) = lim 7 V -. 0 o(i'/a 3 ) = ^mpj^ 00 (L/a x )a t = 2\/Z a t has been used. 
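How quickly the finite-$N$ sums approach this limiting form can be explored numerically. The sketch below uses only the standard library; the function names and default arguments are illustrative, and the finite-$N$ branch simply codes the matrix elements of Eq. (A-7) and their inversion via Eq. (A-4) for an odd number of points placed symmetrically about the peak.

```python
import math

def gaussian_eta_sq(t, N, L_over_a3, r=1.0, sigma=1.0):
    """Normalized variance N*sigma_y^2/sigma^2 from Eqs. (A-7) and (A-4); N odd."""
    dt = L_over_a3 / (N - 1)                              # normalized spacing delta_t
    half = range(1, (N - 1) // 2 + 1)
    S0 = sum(math.exp(-(i * dt) ** 2) for i in half)
    S2 = sum(i**2 * math.exp(-(i * dt) ** 2) for i in half)
    S4 = sum(i**4 * math.exp(-(i * dt) ** 2) for i in half)
    h11 = (2 * S0 + 1) / sigma**2
    h13 = 2 * dt**2 * r * S2 / sigma**2
    h22 = 2 * dt**2 * r**2 * S2 / sigma**2
    h33 = 2 * dt**4 * r**2 * S4 / sigma**2
    det = h11 * h33 - h13**2
    c11, c13, c22, c33 = h33 / det, -h13 / det, 1 / h22, h11 / det
    var = math.exp(-t * t) * (c11 + (c22 * r + 2 * c13) * r * t**2 + c33 * r**2 * t**4)
    return N * var / sigma**2

def gaussian_eta_sq_limit(t, sigma_t):
    """Limiting form for N -> infinity and L >= 4*a3, written with t = sigma_t*xi."""
    return math.sqrt(3 / math.pi) * sigma_t * math.exp(-t * t) * (3 + 4 * t**4)
```

For a comparison, `sigma_t` can be taken from the limiting relation $L/a_3 = 2\sqrt{3}\, \sigma_t$; the normalized result does not depend on the illustrative defaults `r` and `sigma`.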

B. Data Errors Proportional to $y(x; a)$

When the dependent variable extends over a very large range, as in the case of the Gaussian function, it
often happens that the measurement error scales with the measurement itself, at least in a piecewise sense,
due to the changing of instrument ranges as the variable values change. Thus, letting $\sigma_i = \beta y(x_i; a)$, the
curvature matrix elements become

$$h_{jk} = \frac{1}{\beta^2} \sum_{i=1}^{N} \frac{d_j(x_i)\, d_k(x_i)}{y^2(x_i; a)}$$

and using Eq. (A-6) for the derivatives, we have

$$\begin{aligned}
h_{11} &= \frac{N}{(a_1 \beta)^2} \\
h_{12} &= \frac{1}{a_1 a_3 \beta^2} \sum_{i=1}^{N} t_i \\
h_{13} &= \frac{1}{a_1 a_3 \beta^2} \sum_{i=1}^{N} t_i^2 \\
h_{22} &= \frac{1}{(a_3 \beta)^2} \sum_{i=1}^{N} t_i^2 \\
h_{23} &= \frac{1}{(a_3 \beta)^2} \sum_{i=1}^{N} t_i^3 \\
h_{33} &= \frac{1}{(a_3 \beta)^2} \sum_{i=1}^{N} t_i^4
\end{aligned} \tag{A-9}$$


These are of the same form as was found for the case of the quadratic fit, so the derivation follows through
in much the same way, giving the result

$$\sigma_y^2(x) = \frac{(a_1 \beta)^2}{N}\, e^{-t^2} \left[ A_0^{(2)}(N) + A_2^{(2)}(N) \left( \frac{t}{\sigma_t} \right)^2 + A_4^{(2)}(N) \left( \frac{t}{\sigma_t} \right)^4 \right]$$

where we have again assumed symmetric data, and the coefficients that appear are the same ones appearing
in the quadratic fit. This may be simplified by noting that $a_1 \beta = \sigma_i a_1/y(x_i; a) = a_1 \sigma(x)/y(x; a) =
\sigma(x)\, e^{t^2/2}$ so that we finally obtain $\eta^2 = A_0^{(2)}(N) + A_2^{(2)}(N)\xi^2 + A_4^{(2)}(N)\xi^4$, where $\eta = \sqrt{N}\, (\sigma_y(x)/\sigma(x))$
and $t/\sigma_t = (x - \bar{x})/\sigma_x = \xi$, in exact agreement with the expression for the quadratic fit.

IV. Monte Carlo Simulations 

As a check on the above derivations, a series of Monte Carlo simulations has been carried out for the
various cases treated by creating ensembles of data of the form $y_{ni} = y(x_i; a) + r_n(x_i)$, where $r_n(x_i)$
is a zero-mean, normally distributed random variable having variance $\sigma_i^2$, fitting the function $y(x; a)$ to
each of these to obtain an ensemble of fitted functions $y(x; a_n)$, and computing the variance $\sigma_y^2(x_m) =
(1/N_e) \sum_{n=1}^{N_e} y^2(x_m; a_n) - [(1/N_e) \sum_{n=1}^{N_e} y(x_m; a_n)]^2$ for a series of values of $x_m$, where $N_e$ is the number
of ensemble members, so that comparisons can be made with the theoretical results. These are shown in
Figs. 5 through 7, and it can be seen that the agreement is excellent.
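A simulation of this kind might look like the sketch below for the quadratic case with constant data errors. It is not the code used for the figures; NumPy is assumed, and the ensemble size, noise level, "true" coefficients, and evaluation points are all illustrative. The ensemble variance is computed as described above and compared with the closed-form quadratic result for uniformly distributed points.

```python
import numpy as np

rng = np.random.default_rng(1)
N, N_e, sigma = 15, 4000, 0.1                 # illustrative ensemble settings
x = np.linspace(-1.0, 1.0, N)                 # uniform data points
y_true = 0.5 + 0.2 * x - 0.7 * x**2           # assumed "true" quadratic
x_m = np.array([-1.5, 0.0, 0.6, 1.2])         # evaluation points, some beyond the data

fits = np.empty((N_e, x_m.size))
for n in range(N_e):
    y_n = y_true + rng.normal(0.0, sigma, N)  # y_ni = y(x_i; a) + r_n(x_i)
    fits[n] = np.polyval(np.polyfit(x, y_n, 2), x_m)

var_mc = np.mean(fits**2, axis=0) - np.mean(fits, axis=0) ** 2

# Closed-form quadratic result: eta^2 = A0 + A2*xi^2 + A4*xi^4, eta^2 = N*sigma_y^2/sigma^2.
xi = (x_m - x.mean()) / x.std()
d = 4.0 * (N**2 - 4)
eta_sq = (3 * (3 * N**2 - 7) - 6 * (N**2 + 1) * xi**2 + 5 * (N**2 - 1) * xi**4) / d
var_theory = sigma**2 * eta_sq / N
```

With a few thousand ensemble members the statistical scatter of `var_mc` is a few percent, so agreement with `var_theory` should be judged at that level.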


V. Parameter Errors for Gaussian Fitting Functions 

The results obtained above permit the calculation of standard errors for the parameters $a_j$ appearing
in the various cases treated, i.e., $\sigma_{a_j}^2 = C_{jj}$. However, since these errors will depend on the location of
the y-axis, the results are not of general interest unless the data are located symmetrically relative to 
the fitting function, as, for example, in the case of the Gaussian function. This function is of sufficient 
interest to warrant a separate discussion, which we present below for the two cases considered earlier, 
namely, constant data errors and data errors proportional to the function. 


A. Constant Data Errors 

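When Eq. (A-7) is substituted into Eq. (A-4), the following expressions are obtained for the normalized
standard errors:

$$\begin{aligned}
N\, \frac{\sigma_{a_1}^2}{\sigma^2} &= \frac{N S_4}{(2S_0 + 1) S_4 - 2S_2^2} \\
N\, \frac{(\sigma_{a_2}/a_3)^2}{(\sigma/a_1)^2} &= \frac{1}{2}\, \frac{N}{\delta_t^2 S_2} \\
N\, \frac{(\sigma_{a_3}/a_3)^2}{(\sigma/a_1)^2} &= \frac{1}{2}\, \frac{N (2S_0 + 1)}{\delta_t^4 \left[ (2S_0 + 1) S_4 - 2S_2^2 \right]}
\end{aligned}$$

where the sums $S_0$, $S_2$, and $S_4$ are given by Eq. (A-7), and $\delta_t^2 = (12/[N^2 - 1])(\sigma_x^2/a_3^2) = (12/[N^2 - 1])\sigma_t^2$,
as before. These results are valid for any odd value of $N \geq 3$, and in Fig. 8 we show the general behavior
as a function of the variable $L/a_3 = (N - 1)\delta_t$ for $N$ = 3, 5, 7, 9, and 11. In the limit $N \to \infty$ and
$L/a_3 \geq 4$, the above normalized standard errors become

$$\begin{aligned}
N\, \frac{\sigma_{a_1}^2}{\sigma^2} &= \frac{3}{2\sqrt{\pi}}\, \frac{L}{a_3} \\
N\, \frac{(\sigma_{a_2}/a_3)^2}{(\sigma/a_1)^2} &= \frac{2}{\sqrt{\pi}}\, \frac{L}{a_3} \\
N\, \frac{(\sigma_{a_3}/a_3)^2}{(\sigma/a_1)^2} &= \frac{2}{\sqrt{\pi}}\, \frac{L}{a_3}
\end{aligned}$$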
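For sampling-strategy studies it can be useful to tabulate these normalized parameter errors for given $N$ and $L/a_3$. The sketch below uses only the standard library; `gaussian_param_errors` is a hypothetical name, and the function codes the finite-$N$ expressions for an odd number of uniformly spaced points placed symmetrically about the peak.

```python
import math

def gaussian_param_errors(N, L_over_a3):
    """N*sigma_a1^2/sigma^2 and N*(sigma_aj/a3)^2/(sigma/a1)^2 for j = 2, 3; N odd."""
    dt = L_over_a3 / (N - 1)                              # normalized spacing delta_t
    half = range(1, (N - 1) // 2 + 1)
    S0, S2, S4 = (sum(i**p * math.exp(-(i * dt) ** 2) for i in half) for p in (0, 2, 4))
    core = (2 * S0 + 1) * S4 - 2 * S2**2
    return (N * S4 / core,
            N / (2 * dt**2 * S2),
            N * (2 * S0 + 1) / (2 * dt**4 * core))
```

For large `N` and $L/a_3 \geq 4$ the three values approach the limiting expressions, which grow linearly with $L/a_3$; extending the data interval at a fixed number of points therefore increases the normalized errors once the Gaussian is well covered.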


B. Proportional Data Errors 

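Substituting Eq. (A-9) into Eq. (A-4), we obtain the following normalized standard errors:

$$N \left( \frac{\sigma_{a_1}}{a_1} \right)^2 = \beta^2 A_0^{(2)}(N) = \beta^2\, \frac{3(3N^2 - 7)}{4(N^2 - 4)} \to \frac{9}{4}\, \beta^2$$

as $N \to \infty$;

$$N \left( \frac{\sigma_{a_2}}{a_3} \right)^2 = \frac{\beta^2}{\sigma_t^2}$$

independent of $N$; and

$$N \left( \frac{\sigma_{a_3}}{a_3} \right)^2 = \frac{\beta^2}{\sigma_t^4}\, A_4^{(2)}(N) = \frac{\beta^2}{\sigma_t^4}\, \frac{5(N^2 - 1)}{4(N^2 - 4)} \to \frac{5}{4}\, \frac{\beta^2}{\sigma_t^4}$$

as $N \to \infty$, where $\beta$ is the error proportionality constant, i.e., $\sigma_i = \beta y(x_i; a)$.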



